In this analysis, we explore and visualize the top songs by streaming numbers. We focus on identifying the top 10 songs released in 2023 and comparing them with the overall top 10 songs by streams. The comparison shows how recent releases stack up against all-time hits and gives a view of trends in music popularity.

Load Required Libraries

First, we load the necessary libraries. We use readr for reading the dataset, dplyr for data manipulation, tidyr for reshaping, ggplot2 for visualization, and plotly for creating interactive plots.
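The loading step itself is not shown in this write-up, so the following is only a minimal sketch of what it might look like. The helper `load_songs()` and the file name in the final comment are hypothetical. The call to `make.names()` is an assumption based on the column names used later (for example `danceability_.`), which look like sanitized versions of headers containing special characters.

```r
library(readr)    # read_csv()
library(dplyr)    # filter(), arrange(), select(), bind_rows()
library(tidyr)    # reshaping to long format
library(ggplot2)  # static plots
library(plotly)   # ggplotly() for interactive plots

# Hypothetical helper: read a CSV and sanitize its column names so that
# a header such as "danceability_%" becomes "danceability_."
load_songs <- function(path) {
  songs <- read_csv(path, show_col_types = FALSE)
  names(songs) <- make.names(names(songs))
  as.data.frame(songs)  # plain data frame, matching the printed output style
}

# Assumed file name; replace with the actual location of the dataset
# data <- load_songs("spotify-2023.csv")
```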

We filter the dataset to select the top 10 songs released in the year 2023. This helps us focus on the most popular recent releases.

top_10_songs_released_2023 <- data %>%
  filter(released_year == 2023) %>%
  arrange(desc(streams)) %>%
  head(10) %>%
  select(track_name, artist_name, streams)
top_10_songs_released_2023
##                               track_name                artist_name    streams
## 1                                Flowers                Miley Cyrus 1316855716
## 2                        Ella Baila Sola Eslabon Armado, Peso Pluma  725980112
## 3  Shakira: Bzrp Music Sessions, Vol. 53          Shakira, Bizarrap  721975598
## 4                                    TQG           Karol G, Shakira  618990393
## 5                        La Bebe - Remix      Peso Pluma, Yng Lvcas  553634067
## 6                    Die For You - Remix  Ariana Grande, The Weeknd  518745108
## 7                              un x100to  Bad Bunny, Grupo Frontera  505671438
## 8                      Cupid - Twin Ver.                Fifty Fifty  496795686
## 9                                    PRC  Natanael Cano, Peso Pluma  436027885
## 10                                   OMG                   NewJeans  430977451

Here, we select the overall top 10 songs by their number of streams. This gives us an insight into the most popular songs regardless of their release year.

# Note: despite its name, this object holds the overall top 10 across all release years
top_10_songs_2023 <- data %>%
  arrange(desc(streams)) %>%
  head(10) %>%
  select(track_name, released_year, artist_name, streams)
top_10_songs_2023
##                                       track_name released_year
## 1                                Blinding Lights          2019
## 2                                   Shape of You          2017
## 3                              Someone You Loved          2018
## 4                                   Dance Monkey          2019
## 5  Sunflower - Spider-Man: Into the Spider-Verse          2018
## 6                                      One Dance          2016
## 7                      STAY (with Justin Bieber)          2021
## 8                                       Believer          2017
## 9                                         Closer          2016
## 10                                       Starboy          2016
##                     artist_name    streams
## 1                    The Weeknd 3703895074
## 2                    Ed Sheeran 3562543890
## 3                 Lewis Capaldi 2887241814
## 4                   Tones and I 2864791672
## 5         Post Malone, Swae Lee 2808096550
## 6           Drake, WizKid, Kyla 2713922350
## 7  Justin Bieber, The Kid Laroi 2665343922
## 8               Imagine Dragons 2594040133
## 9      The Chainsmokers, Halsey 2591224264
## 10        The Weeknd, Daft Punk 2565529693
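To put the two tables side by side numerically, the short calculation below uses stream counts copied from the output above. The variable names are illustrative only. It suggests that the most-streamed 2023 release has about 36 percent of the streams of the overall leader, and would need well over a billion additional streams to reach the overall top 10.

```r
# Stream counts copied from the two tables above
top_2023_streams      <- 1316855716  # "Flowers" (top 2023 release)
top_overall_streams   <- 3703895074  # "Blinding Lights" (overall #1)
tenth_overall_streams <- 2565529693  # "Starboy" (overall #10)

# Share of the overall leader's streams reached by the top 2023 release
share_of_leader <- top_2023_streams / top_overall_streams

# Additional streams the top 2023 release would need to match overall #10
gap_to_top_10 <- tenth_overall_streams - top_2023_streams

round(share_of_leader, 2)
gap_to_top_10
```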

We combine the two datasets (the top 10 songs released in 2023 and the overall top 10 songs) so that both can be compared in a single visualization. Each song is assigned its rank within its own category, and that rank is used as the x-axis position. We build a base plot with ggplot2 and convert it to an interactive plot with plotly. The plot distinguishes the two categories by color and line type, and hovering over a point displays the track name and artist name.

top_10_songs_released_2023$type <- "Released in 2023"
top_10_songs_2023$type <- "Top 10 Overall"


combined_data <- bind_rows(top_10_songs_released_2023, top_10_songs_2023)

# Rank songs within each category (rows are already sorted by streams)
combined_data <- combined_data %>%
  group_by(type) %>%
  mutate(position = row_number()) %>%
  ungroup()

p <- ggplot(combined_data, aes(x = position, y = streams / 1e6, group = type, color = type, linetype = type,
                               text = paste("Track Name:", track_name, "<br>Artist:", artist_name))) +
  geom_line(linewidth = 1.5) +  # `linewidth` replaces the deprecated `size` for lines (ggplot2 >= 3.4.0)
  geom_point(size = 3) +
  scale_x_continuous(breaks = 1:10) +  # one tick per rank
  labs(title = "Top 10 Songs by Streams",
       x = "Track Position",
       y = "Number of Streams (in millions)",
       color = "Category",
       linetype = "Category") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        legend.position = "bottom")
ggplotly(p, tooltip = "text")

This line plot compares the streaming numbers of the top 10 songs released in 2023 with those of the overall top 10 songs, rank by rank. Hovering over a point shows the track and artist, which makes it easy to look up individual songs while comparing the two curves.

Analyze and Compare Top and Next 10 Songs

We then compare the top 10 songs with the next 10 songs in terms of streaming numbers and audio features. We use ggplot2 to create bar plots and box plots for visualization.

# Filter for top 10 songs
top_10_songs <- data %>%
  arrange(desc(streams)) %>%
  head(10)

# Filter for next 10 songs
next_10_songs <- data %>%
  arrange(desc(streams)) %>%
  slice(11:20)

# Combine the two datasets
top_10_songs$type <- "Top 10"
next_10_songs$type <- "Next 10"

combined_data <- bind_rows(top_10_songs, next_10_songs)

# Plot streaming numbers
p_streams <- ggplot(combined_data, aes(x = reorder(track_name, -streams), y = streams/1000000, fill = type)) +
  geom_bar(stat = "identity", position = position_dodge()) +
  labs(title = "Streaming Numbers for Top 10 and Next 10 Songs",
       x = "Track Name",
       y = "Number of Streams (in millions)",
       fill = "Category") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

# Plot audio features
audio_features <- combined_data %>%
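  select(track_name, type,
         danceability_ = danceability_., valence_ = valence_., energy_ = energy_.,
         acousticness_ = acousticness_., instrumentalness_ = instrumentalness_.,
         liveness_ = liveness_., speechiness_ = speechiness_.) %>%
  # Reshape to long format: one row per track and feature
  # (pivot_longer() is the current replacement for the superseded gather())
  tidyr::pivot_longer(cols = -c(track_name, type), names_to = "feature", values_to = "value")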

p_features <- ggplot(audio_features, aes(x = feature, y = value, fill = type)) +
  geom_boxplot() +
  labs(title = "Audio Features Comparison for Top 10 and Next 10 Songs",
       x = "Audio Feature",
       y = "Value",
       fill = "Category") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

# Convert to interactive plots
ggplotly(p_streams)
ggplotly(p_features)

The bar plot displays the streaming numbers for the top 10 and next 10 songs, while the box plot compares key audio features between these two groups. Both plots are converted to interactive formats for better analysis.
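A numeric summary can complement the box plot when the two groups overlap visually. The sketch below is one possible way to compute it, assuming long-format data with `type`, `feature`, and `value` columns like `audio_features`. The helper `summarise_features()` is hypothetical, and the values in `toy` are made-up placeholders for illustration, not figures from the dataset.

```r
library(dplyr)

# Hypothetical helper: median and interquartile range per feature and group
summarise_features <- function(long_df) {
  long_df %>%
    group_by(feature, type) %>%
    summarise(median = median(value), iqr = IQR(value), .groups = "drop")
}

# Placeholder values for illustration only; NOT taken from the dataset
toy <- data.frame(
  type    = rep(c("Top 10", "Next 10"), each = 3),
  feature = "danceability_",
  value   = c(50, 60, 70, 40, 55, 80)
)

feature_summary <- summarise_features(toy)
feature_summary
```

With the real data, the call would be `summarise_features(audio_features)`.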

Feedback: Roommates suggested that adding labels to the lines in the plot could help differentiate between categories more clearly. They also recommended explaining why certain audio features might be relevant to streaming numbers and comparing the distributions of the audio features.
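One way to act on the labeling suggestion is to place a text label at the end of each line. The following is a sketch rather than the final plot: `demo` is a small stand-in for the combined data, using the first three stream counts of each category from the tables above, and the names `demo`, `line_labels`, and `p_labeled` are illustrative.

```r
library(dplyr)
library(ggplot2)

# Small stand-in for the combined data; stream counts copied from the tables above
demo <- data.frame(
  type     = rep(c("Released in 2023", "Top 10 Overall"), each = 3),
  position = rep(1:3, times = 2),
  streams  = c(1316855716, 725980112, 721975598,
               3703895074, 3562543890, 2887241814)
)

# One label per category, taken from the last point of each line
line_labels <- demo %>%
  group_by(type) %>%
  slice_max(position, n = 1) %>%
  ungroup()

p_labeled <- ggplot(demo, aes(x = position, y = streams / 1e6, color = type)) +
  geom_line(linewidth = 1.5) +
  geom_point(size = 3) +
  # Label each line near its last point; the legend entry for text is suppressed
  geom_text(data = line_labels, aes(label = type),
            hjust = 1, vjust = -1, show.legend = FALSE) +
  labs(x = "Track Position",
       y = "Number of Streams (in millions)",
       color = "Category") +
  theme_minimal()
```

Printing `p_labeled` draws the static version. Whether the labels survive conversion with `ggplotly()` unchanged would need to be checked on the real plot.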